Gen Li

Gaze2Act: Gaze-Conditioned Vision-Language-Action Policies for Interactive Robot Manipulation

May 28, 2026, via arXiv

MARS Policy: Multimodality Only When It Matters

May 28, 2026, via arXiv

OccamToken: Efficient VLM Inference with Training-Free and Budget-Adaptive Token Pruning

May 28, 2026, via arXiv

C-MIG: Multi-view Information Gain-based Retrieval-Augmented Generation for Clinical Diagnosis Reasoning

May 27, 2026, via arXiv

Mags-RL: Wearing Multimodal LLMs a Magnifying Glass via Agentic Reinforcement Learning For Complex Scene Reasoning

May 27, 2026, via arXiv

EAPO: Entropy-Driven Adaptive Positive-Negative Sample Weighting for Policy Optimization in Open-Ended QA

May 27, 2026, via arXiv

Capture-Calibrate-Coach: A Graph-Based Framework for Knowledge Monitoring Estimation and Adaptive Feedback

May 25, 2026, via arXiv

Evo-Depth: A Lightweight Depth-Enhanced Vision-Language-Action Model

May 14, 2026, via arXiv

CFSPMNet: Cross-subject Fourier-guided Spatial-Patch Mamba Network for EEG Motor Imagery Decoding in Stroke Patients

May 11, 2026, via arXiv

Seedance 2.0: Advancing Video Generation for World Complexity

Apr 15, 2026, via arXiv